AITopics | relevance label

2509.16717

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Neural Information Processing Systems | Nov-18-2025, 08:23:43 GMT

RD-Suite: A Benchmark for Ranking Distillation

Moreover, inconsistent benchmarking on a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field.

distillation, machine learning, natural language, (16 more...)

Country: North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Overview (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Peshevski, Dimitar, Blazhevski, Kiril, Popovski, Martin, Madjarov, Gjorgji

Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision

arXiv.org Artificial Intelligence | Oct-3-2025

Effective document reranking is essential for improving search relevance across diverse applications. While Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning, their high computational cost makes them impractical for many real-world deployments. Fine-tuning smaller, task-specific models is a more efficient alternative but typically depends on scarce, manually labeled data. To overcome this, we propose a novel pipeline that eliminates the need for human-labeled query-document pairs. Our method uses LLMs to generate synthetic queries from domain-specific corpora and employs an LLM-based classifier to label positive and hard-negative pairs. This synthetic dataset is then used to fine-tune a smaller transformer model with contrastive learning using Localized Contrastive Estimation (LCE) loss. Experiments on the MedQuAD dataset show that our approach significantly boosts in-domain performance and generalizes well to out-of-domain tasks. By using LLMs for data generation and supervision rather than inference, we reduce computational costs while maintaining strong reranking capabilities.
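The Localized Contrastive Estimation (LCE) loss named in the abstract can be sketched as a softmax cross-entropy over one positive and its hard negatives for the same query. This is a minimal illustration rather than the authors' training code: the function name `lce_loss` and the convention that the positive score sits at index 0 of each group are assumptions made here.

```python
import math


def lce_loss(groups):
    """Mean LCE loss over query groups (illustrative sketch).

    Each group is a list of reranker scores for one query. By assumption,
    index 0 holds the positive document and the rest are hard negatives.
    """
    total = 0.0
    for scores in groups:
        # Numerically stable log-sum-exp over the whole group.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the positive under the group softmax.
        total += log_z - scores[0]
    return total / len(groups)


# The loss drops when the positive outscores its hard negatives.
good = lce_loss([[5.0, 0.0, 0.0]])
bad = lce_loss([[0.0, 5.0, 0.0]])
```

In a real fine-tuning loop the scores would come from the smaller transformer reranker and the loss would be computed with an autodiff framework; plain floats are used here only to show the arithmetic.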

large language model, machine learning, natural language, (20 more...)

2510.01229

Country:

Europe > North Macedonia > Skopje Statistical Region > Skopje Municipality > Skopje (0.04)
Asia > China (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems | Sep-25-2025, 23:32:04 GMT

b5200c6107fc3d41d19a2b66835c3974-Paper.pdf

artificial intelligence, machine learning, relaxation, (17 more...)

Country: North America > United States > California (0.46)

Genre: Research Report (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence | Sep-19-2025

CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets

Wang, Yujing, Chen, Yiren, Li, Huoran, Xu, Chunxu, Luo, Yuchong, Mao, Xianghui, Li, Cong, Du, Lun, Ma, Chunyang, Jiang, Qiqi, Wang, Yin, Gao, Fan, Mo, Wenting, Wen, Pei, Kumar, Shantanu, Park, Taejin, Song, Yiwei, Rajaram, Vijay, Cheng, Tao, Durgia, Sonu, Kolari, Pranam

As global e-commerce platforms continue to expand, companies are entering new markets where they encounter cold-start challenges due to limited human labels and user behaviors. In this paper, we share our experience at Coupang in delivering competitive cold-start relevance matching performance for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework, utilizing a multilingual Large Language Model (LLM) to address three challenges: (1) activating cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge by retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, resulting in successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.

large language model, machine learning, natural language, (17 more...)

2509.01566

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems | Aug-17-2025, 00:37:33 GMT

PiRank: Scalable Learning To Rank via Differentiable Sorting

The preference over items is specified via relevance labels for each candidate.
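As a rough illustration of how per-candidate relevance labels turn into a ranking objective, the sketch below computes NDCG for a ranked list of graded labels. The snippet above does not name a metric, so the choice of NDCG and of the exponential gain `2**rel - 1` is an assumption made for this example, and the function names are hypothetical.

```python
import math


def dcg(labels):
    """Discounted cumulative gain of labels listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank 0 gets discount log2(2) = 1
        for rank, rel in enumerate(labels)
    )


def ndcg(labels_in_ranked_order):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    if ideal == 0:
        return 0.0  # no relevant items: define NDCG as 0
    return dcg(labels_in_ranked_order) / ideal


# A ranking that places the most relevant items first scores higher.
perfect = ndcg([3, 2, 1, 0])
reversed_order = ndcg([0, 1, 2, 3])
```

Because sorting is not differentiable, metrics of this kind cannot be optimized directly by gradient descent, which is the gap that differentiable-sorting approaches aim to close.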

artificial intelligence, machine learning, relaxation, (17 more...)

Country: North America > United States > California (0.46)

Genre: Research Report (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence | Aug-15-2025

Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data

Morishetti, Lalitesh, Kumar, Abhay, Scott, Jonathan, Nag, Kaushiki, Sharma, Gunjan, Vashishtha, Shanu, Sridhar, Rahul, Chatter, Rohit, Achan, Kannan

In this paper, we present a novel model architecture for optimizing personalized product search ranking using a multi-task learning (MTL) framework. Our approach uniquely integrates tabular and non-tabular data, leveraging a pre-trained TinyBERT model for semantic embeddings and a novel sampling technique to capture diverse customer behaviors. We evaluate our model against several baselines, including XGBoost, TabNet, FT-Transformer, DCN-V2, and MMoE, focusing on their ability to handle mixed data types and optimize personalized ranking. Additionally, we propose a scalable relevance labeling mechanism based on click-through rates, click positions, and semantic similarity, offering an alternative to traditional human-annotated labels. Experimental results show that combining non-tabular data with advanced embedding techniques in a multi-task learning paradigm significantly enhances model performance. Ablation studies further underscore the benefits of incorporating relevance labels, fine-tuning TinyBERT layers, and TinyBERT query-product embedding interactions. These results demonstrate the effectiveness of our approach in achieving improved personalized product search ranking.

artificial intelligence, deep learning, machine learning, (18 more...)

2508.09636

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence | Aug-5-2025

FinCPRG: A Bidirectional Generation Pipeline for Hierarchical Queries and Rich Relevance in Financial Chinese Passage Retrieval

Xu, Xuan, Chu, Beilin, Lin, Qinhong, Zhong, Yixiao, Wen, Fufang, Liu, Jiaqi, Fei, Binjie, Li, Yu, Yang, Zhongliang, Zhou, Linna

In recent years, large language models (LLMs) have demonstrated significant potential in constructing passage retrieval datasets. However, existing methods still face limitations in expressing cross-doc query needs and controlling annotation quality. To address these issues, this paper proposes a bidirectional generation pipeline, which aims to generate 3-level hierarchical queries for both intra-doc and cross-doc scenarios and mine additional relevance labels on top of direct mapping annotation. The pipeline introduces two query generation methods: bottom-up from single-doc text and top-down from multi-doc titles. The bottom-up method uses LLMs to disassemble and generate structured queries at both sentence-level and passage-level simultaneously from intra-doc passages. The top-down approach incorporates three key financial elements--industry, topic, and time--to divide report titles into clusters and prompts LLMs to generate topic-level queries from each cluster. For relevance annotation, our pipeline not only relies on direct mapping annotation from the generation relationship but also implements an indirect positives mining method to enrich the relevant query-passage pairs. Using this pipeline, we constructed a Financial Passage Retrieval Generated dataset (FinCPRG) from almost 1.3k Chinese financial research reports, which includes hierarchical queries and rich relevance labels. Through evaluations of mined relevance labels, benchmarking and training experiments, we assessed the quality of FinCPRG and validated its effectiveness as a passage retrieval dataset for both training and benchmarking.

large language model, machine learning, natural language, (17 more...)

2508.02222

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.49)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Ingram, William A., Banerjee, Bipasha, Fox, Edward A.

When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search

arXiv.org Artificial Intelligence | Jul-4-2025

Large language models (LLMs) are increasingly used to assign document relevance labels in information retrieval pipelines, especially in domains lacking human-labeled data. However, different models often disagree on borderline cases, raising concerns about how such disagreement affects downstream retrieval. This study examines labeling disagreement between two open-weight LLMs, LLaMA and Qwen, on a corpus of scholarly abstracts related to Sustainable Development Goals (SDGs) 1, 3, and 7. We isolate disagreement subsets and examine their lexical properties, rank-order behavior, and classification predictability. Our results show that model disagreement is systematic, not random: disagreement cases exhibit consistent lexical patterns, produce divergent top-ranked outputs under shared scoring functions, and are distinguishable with AUCs above 0.74 using simple classifiers. These findings suggest that LLM-based filtering introduces structured variability in document retrieval, even under controlled prompting and shared ranking logic. We propose using classification disagreement as an object of analysis in retrieval evaluation, particularly in policy-relevant or thematic search tasks.
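Isolating a disagreement subset between two LLM labelers, as the abstract describes, can be sketched in a few lines. The agreement statistic used here, Cohen's kappa, is not mentioned in the abstract and is an assumption added for illustration, as are the function names and the toy labels.

```python
def disagreement_subset(labels_a, labels_b):
    """Indices of documents where the two labelers assign different labels."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Agreement expected if both labelers chose independently at their own rates.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both labelers always emit the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary relevance labels from two models on four abstracts.
model_a = [1, 1, 0, 0]
model_b = [1, 0, 0, 0]
disputed = disagreement_subset(model_a, model_b)
kappa = cohen_kappa(model_a, model_b)
```

The indices in `disputed` are the cases one would then inspect for lexical patterns or feed to a simple classifier, in the spirit of the analysis the abstract outlines.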

large language model, machine learning, natural language, (17 more...)